AITopics | source document

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question "What is Artificial Intelligence?" and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

9b6d7202750e8e32cd5270eb7fc131f7-Paper-Conference.pdf

Neural Information Processing Systems, Feb-11-2026, 00:06:11 GMT

information, summarization, summarization model, (17 more...)

Country:

South America > Ecuador (0.14)
North America > Costa Rica (0.14)
Europe > Belgium (0.04)
South America > Brazil (0.04)

Genre:

Research Report > Experimental Study (0.68)
Research Report > New Finding (0.46)

Industry: Leisure & Entertainment > Sports > Soccer (0.68)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

552ef803bef9368c29e53c167de34b55-Supplemental-Datasets_and_Benchmarks.pdf

Neural Information Processing Systems, Feb-9-2026, 01:45:01 GMT

For what purpose was the dataset created? Was there a specific task in mind? Was there a specific gap that needed to be filled? Please provide a description.

The Multi-LexSum dataset was curated to facilitate the development of automatic summarization methods for civil rights lawsuits. Recent advances in document summarization have led to impressive results in generating a short description for passages that are typically a few hundred words long. However, the source inputs for summarizing civil rights lawsuits are considerably longer: they can contain up to 70k words on average.

artificial intelligence, dataset, natural language, (18 more...)

Country:

North America > United States > Michigan (0.05)
North America > United States > Nevada (0.04)
North America > United States > California > Los Angeles County > Los Angeles (0.04)
North America > United States > New York (0.04)

Industry:

Law > Litigation (1.00)
Law > Civil Rights & Constitutional Law (1.00)

Technology: Information Technology > Artificial Intelligence > Natural Language (0.93)

HalluGraph: Auditable Hallucination Detection for Legal RAG Systems via Knowledge Graph Alignment

Noël, Valentin, Seidou, Elimane Yassine, Capo-Chichi, Charly Ken, Amari, Ghanem

arXiv.org Artificial Intelligence, Dec-2-2025

Legal AI systems powered by retrieval-augmented generation (RAG) face a critical accountability challenge: when an AI assistant cites case law, statutes, or contractual clauses, practitioners need verifiable guarantees that generated text faithfully represents source documents. Existing hallucination detectors rely on semantic similarity metrics that tolerate entity substitutions, a dangerous failure mode when confusing parties, dates, or legal provisions can have material consequences. We introduce HalluGraph, a graph-theoretic framework that quantifies hallucinations through structural alignment between knowledge graphs extracted from context, query, and response. Our approach produces bounded, interpretable metrics decomposed into Entity Grounding (EG), measuring whether entities in the response appear in source documents, and Relation Preservation (RP), verifying that asserted relationships are supported by context. On structured control documents (>400 words, >20 entities), HalluGraph achieves near-perfect discrimination (AUC = 0.979), while maintaining robust performance (AUC ≈ 0.89) on challenging generative legal tasks, consistently outperforming semantic similarity baselines. The framework provides the transparency and traceability required for high-stakes legal applications, enabling full audit trails from generated assertions back to source passages.
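
The sketch below shows the two scores in simplified form. It makes several assumptions. Entities and (subject, relation, object) triples are taken as already extracted from the context and the response; the extraction step is not shown. Matching is exact after lowercasing. Each score is the fraction of response items found in the context. The function names and the matching rule are illustrative, not HalluGraph's published implementation. The sketch also ignores the query graph that the abstract mentions.

```python
from typing import Iterable, Tuple

# A relation is a (subject, predicate, object) triple extracted from text.
Triple = Tuple[str, str, str]


def _norm(text: str) -> str:
    """Lowercase and collapse whitespace so trivial surface variants match."""
    return " ".join(text.lower().split())


def entity_grounding(response_entities: Iterable[str], context_entities: Iterable[str]) -> float:
    """Fraction of response entities that also appear in the context (in [0, 1])."""
    resp = {_norm(e) for e in response_entities}
    if not resp:
        return 1.0  # assumption: a response with no entities is vacuously grounded
    ctx = {_norm(e) for e in context_entities}
    return len(resp & ctx) / len(resp)


def relation_preservation(response_triples: Iterable[Triple], context_triples: Iterable[Triple]) -> float:
    """Fraction of response triples that exactly match a context triple (in [0, 1])."""
    resp = {tuple(_norm(part) for part in t) for t in response_triples}
    if not resp:
        return 1.0  # assumption: no asserted relations means nothing is unsupported
    ctx = {tuple(_norm(part) for part in t) for t in context_triples}
    return len(resp & ctx) / len(resp)


# Toy example (invented data): the response swaps one party for another.
context_entities = {"Acme Corp", "Beta LLC", "2019"}
context_triples = {("Acme Corp", "sued", "Beta LLC")}
response_entities = {"Acme Corp", "Gamma Inc"}
response_triples = {("Acme Corp", "sued", "Gamma Inc")}

print(entity_grounding(response_entities, context_entities))     # 0.5
print(relation_preservation(response_triples, context_triples))  # 0.0
```

The toy example reproduces the failure mode the abstract describes. Substituting one party leaves most of the wording intact, yet EG drops to 0.5 and RP drops to 0.0.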

large language model, machine learning, natural language, (18 more...)

arXiv:2512.01659

Genre: Research Report (0.64)

Industry: Law (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.70)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.70)
Information Technology > Artificial Intelligence > Representation & Reasoning > Semantic Networks (0.63)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.35)

LiveCLKTBench: Towards Reliable Evaluation of Cross-Lingual Knowledge Transfer in Multilingual LLMs

Guo, Pei-Fu, Tsai, Yun-Da, Hsu, Chun-Chia, Chen, Kai-Xin, Tsai, Ya-An, Chang, Kai-Wei, Peng, Nanyun, Yeh, Mi-Yen, Lin, Shou-De

arXiv.org Artificial Intelligence, Nov-25-2025

Evaluating cross-lingual knowledge transfer in large language models is challenging, as correct answers in a target language may arise either from genuine transfer or from prior exposure during pre-training. We present LiveCLKTBench, an automated generation pipeline specifically designed to isolate and measure cross-lingual knowledge transfer. Our pipeline identifies self-contained, time-sensitive knowledge entities from real-world domains, filters them based on temporal occurrence, and verifies them against the model's knowledge. The documents of these valid entities are then used to generate factual questions, which are translated into multiple languages to evaluate transferability across linguistic boundaries. Using LiveCLKTBench, we evaluate several LLMs across five languages and observe that cross-lingual transfer is strongly influenced by linguistic distance and often asymmetric across language directions. While larger models improve transfer, the gains diminish with scale and vary across domains. These findings provide new insights into multilingual transfer and demonstrate the value of LiveCLKTBench as a reliable benchmark for future research.
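
The entity-filtering step could look roughly like the sketch below. It assumes three things: each candidate entity carries a first-seen date, the model's training cutoff is known, and a probe `model_knows` reports whether the model can already answer questions about an entity. `Entity`, `first_seen`, `filter_fresh_entities`, and `model_knows` are hypothetical names. The abstract does not specify how LiveCLKTBench implements either check.

```python
from dataclasses import dataclass
from datetime import date
from typing import Callable, List


@dataclass(frozen=True)
class Entity:
    name: str
    first_seen: date  # hypothetical field: earliest date the entity appears in its domain


def filter_fresh_entities(
    entities: List[Entity],
    training_cutoff: date,
    model_knows: Callable[[str], bool],
) -> List[Entity]:
    """Keep entities that postdate the cutoff and that the model cannot already answer about."""
    return [
        e for e in entities
        if e.first_seen > training_cutoff  # temporal filter: unseen during pre-training
        and not model_knows(e.name)        # knowledge check, e.g. by probing the model
    ]


candidates = [
    Entity("Event A", date(2025, 3, 1)),
    Entity("Event B", date(2023, 1, 1)),  # too old: may have been seen in pre-training
    Entity("Event C", date(2025, 6, 1)),
]
# Stand-in for probing the model; here it "already knows" Event C.
known = {"Event C"}
fresh = filter_fresh_entities(candidates, date(2024, 12, 31), lambda name: name in known)
print([e.name for e in fresh])  # ['Event A']
```

Entities that survive both filters give a clean test. If the model then answers a question about them correctly in another language, the knowledge most plausibly came from the provided documents rather than from pre-training.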

benchmark, large language model, machine learning, (18 more...)

arXiv:2511.14774

Country:

Asia (0.68)
North America > United States > California (0.28)

Genre: Research Report (0.64)

Industry:

Leisure & Entertainment > Sports (1.00)
Media (0.69)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

SemanticCite: Citation Verification with AI-Powered Full-Text Analysis and Evidence-Based Reasoning

Haan, Sebastian

arXiv.org Artificial Intelligence, Nov-21-2025

Effective scientific communication depends on accurate citations that validate sources and guide readers to supporting evidence. Yet academic literature faces mounting challenges: semantic citation errors that misrepresent sources, AI-generated hallucinated references, and traditional citation formats that point to entire papers without indicating which sections substantiate specific claims. We introduce SemanticCite, an AI-powered system that verifies citation accuracy through full-text source analysis while providing rich contextual information via detailed reasoning and relevant text snippets. Our approach combines multiple retrieval methods with a four-class classification system (Supported, Partially Supported, Unsupported, Uncertain) that captures nuanced claim-source relationships and enables appropriate remedial actions for different error types. Our experiments show that fine-tuned lightweight language models achieve performance comparable to large commercial systems with significantly lower computational requirements, making large-scale citation verification practically feasible. The system provides transparent, evidence-based explanations that support user understanding and trust. We contribute a comprehensive dataset of over 1,000 citations with detailed alignments, functional classifications, semantic annotations, and bibliometric metadata across eight disciplines, alongside fine-tuned models and the complete verification framework as open-source software. SemanticCite addresses critical challenges in research integrity through scalable citation verification, streamlined peer review, and quality control for AI-generated content, providing an open-source foundation for maintaining citation accuracy at scale.

large language model, machine learning, natural language, (21 more...)

arXiv:2511.16198

Genre: Research Report > New Finding (0.93)

Industry: Health & Medicine > Therapeutic Area (0.93)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.89)

Stress Testing Factual Consistency Metrics for Long-Document Summarization

Mujahid, Zain Muhammad, Wright, Dustin, Augenstein, Isabelle

arXiv.org Artificial Intelligence, Nov-12-2025

Evaluating the factual consistency of abstractive text summarization remains a significant challenge, particularly for long documents, where conventional metrics struggle with input length limitations and long-range dependencies. In this work, we systematically evaluate the reliability of six widely used reference-free factuality metrics, originally proposed for short-form summarization, in the long-document setting. We probe metric robustness through seven factuality-preserving perturbations applied to summaries, namely paraphrasing, simplification, synonym replacement, logically equivalent negations, vocabulary reduction, compression, and source text insertion, and further analyze their sensitivity to retrieval context and claim information density. Across three long-form benchmark datasets spanning science fiction, legal, and scientific domains, our results reveal that existing short-form metrics produce inconsistent scores for semantically equivalent summaries and exhibit declining reliability for information-dense claims whose content is semantically similar to many parts of the source document. While expanding the retrieval context improves stability in some domains, no metric consistently maintains factual alignment under long-context conditions. Finally, our results highlight concrete directions for improving factuality evaluation, including multi-span reasoning, context-aware calibration, and training on meaning-preserving variations to enhance robustness in long-form summarization. We release all code, perturbed data, and scripts required to reproduce our results at https://github.com/zainmujahid/metricEval-longSum.
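
The basic robustness probe works as follows: apply a meaning-preserving perturbation to a summary and check how much a metric's score moves. This sketch applies a single perturbation (synonym replacement). It uses a deliberately simple lexical metric, `token_precision`, as a hypothetical stand-in for the six metrics the paper evaluates. `score_spread` is likewise an illustrative consistency measure, not the paper's.

```python
from typing import Callable, Dict, List

Metric = Callable[[str, str], float]


def token_precision(source: str, summary: str) -> float:
    """Toy stand-in metric: fraction of summary tokens that also occur in the source."""
    source_tokens = set(source.lower().split())
    summary_tokens = summary.lower().split()
    if not summary_tokens:
        return 0.0
    return sum(tok in source_tokens for tok in summary_tokens) / len(summary_tokens)


def synonym_replace(summary: str, synonyms: Dict[str, str]) -> str:
    """A meaning-preserving perturbation: swap words for listed synonyms."""
    return " ".join(synonyms.get(tok, tok) for tok in summary.split())


def score_spread(metric: Metric, source: str, variants: List[str]) -> float:
    """Max minus min score over equivalent summaries; 0 means perfectly consistent."""
    scores = [metric(source, v) for v in variants]
    return max(scores) - min(scores)


source = "the court approved the settlement in 2019"
summary = "the court approved the settlement"
variants = [summary, synonym_replace(summary, {"approved": "endorsed"})]
print(round(score_spread(token_precision, source, variants), 3))  # 0.2
```

A robust metric would give a spread near zero across such variants. A purely lexical metric penalizes the harmless synonym swap, which is exactly the kind of inconsistency the study probes at scale.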

computational linguistic, large language model, natural language, (15 more...)

arXiv:2511.07689

Country:

North America > United States (1.00)
Europe (1.00)
Asia (1.00)

Genre: Research Report > New Finding (1.00)

Industry: Law (0.49)

Technology: Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.70)

LLM-Supported Formal Knowledge Representation for Enhancing Control Engineering Content with an Interactive Semantic Layer

Fiedler, Julius, Knoll, Carsten, Röbenack, Klaus

arXiv.org Artificial Intelligence, Nov-5-2025

The rapid growth of research output in control engineering calls for new approaches to structure and formalize domain knowledge. This paper briefly describes an LLM-supported method for semi-automated generation of formal knowledge representations that combine human readability with machine interpretability and increased expressiveness. Based on the Imperative Representation of Knowledge (PyIRK) framework, we demonstrate how language models can assist in transforming natural-language descriptions and mathematical definitions (available as LaTeX source code) into a formalized knowledge graph. As a first application we present the generation of an "interactive semantic layer" to enhance the source documents in order to facilitate knowledge transfer. From our perspective this contributes to the vision of easily accessible, collaborative, and verifiable knowledge bases for the control engineering domain.

artificial intelligence, large language model, natural language, (13 more...)

arXiv:2511.02759

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.92)
Information Technology > Artificial Intelligence > Cognitive Science > Problem Solving (0.74)

Leveraging Hierarchical Organization for Medical Multi-document Summarization

Hsu, Yi-Li, Mei, Katelyn X., Wang, Lucy Lu

arXiv.org Artificial Intelligence, Nov-5-2025

Medical multi-document summarization (MDS) is a complex task that requires effectively managing cross-document relationships. This paper investigates whether incorporating hierarchical structures in the inputs of MDS can improve a model's ability to organize and contextualize information across documents compared to traditional flat summarization methods. We investigate two ways of incorporating hierarchical organization across three large language models (LLMs), and conduct comprehensive evaluations of the resulting summaries using automated metrics, model-based metrics, and domain expert evaluation of preference, understandability, clarity, complexity, relevance, coverage, factuality, and coherence. Our results show that human experts prefer model-generated summaries over human-written summaries. Hierarchical approaches generally preserve factuality, coverage, and coherence of information, while also increasing human preference for summaries. Additionally, we examine whether simulated judgments from GPT-4 align with human judgments, finding higher agreement along more objective evaluation facets. Our findings demonstrate that hierarchical structures can improve the clarity of medical summaries generated by models while maintaining content coverage, providing a practical way to improve human preference for generated summaries.

computational linguistic, large language model, machine learning, (19 more...)

arXiv:2510.23104

Country: North America > United States > Minnesota (0.28)

Genre: Research Report > New Finding (1.00)

Industry: Health & Medicine (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.70)

A Graph-based RAG for Energy Efficiency Question Answering

Campi, Riccardo, Vago, Nicolò Oreste Pinciroli, Giudici, Mathyas, Rodriguez-Guisado, Pablo Barrachina, Brambilla, Marco, Fraternali, Piero

arXiv.org Artificial Intelligence, Nov-4-2025

In this work, we investigate the use of Large Language Models (LLMs) within a graph-based Retrieval Augmented Generation (RAG) architecture for Energy Efficiency (EE) Question Answering. First, the system automatically extracts a Knowledge Graph (KG) from guidance and regulatory documents in the energy field. Then, the generated graph is navigated and reasoned upon to provide users with accurate answers in multiple languages. We implement a human-based validation using the RAGAs framework properties, a validation dataset comprising 101 question-answer pairs, and domain experts. Results confirm the potential of this architecture and identify its strengths and weaknesses. Validation results show that the system answers correctly in about three out of four cases (75.2 ± 2.7%), with higher accuracy on questions about more general EE topics (up to 81.0 ± 4.1%), and promising multilingual abilities (a 4.4% accuracy loss due to translation).
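
One way a generated graph might be "navigated" before answering is a bounded breadth-first expansion from the entities mentioned in a question. The collected triples would then be handed to the LLM as grounding context. The triple format, the `hops` bound, and the function names are assumptions for illustration. The abstract does not describe the paper's traversal or prompting strategy, and the energy-efficiency facts below are invented toy data.

```python
from collections import defaultdict, deque
from typing import Dict, Iterable, List, Tuple

Triple = Tuple[str, str, str]  # (subject, relation, object)


def build_index(triples: Iterable[Triple]) -> Dict[str, List[Triple]]:
    """Map each node to the triples it participates in (as subject or object)."""
    index: Dict[str, List[Triple]] = defaultdict(list)
    for triple in triples:
        subj, _, obj = triple
        index[subj].append(triple)
        index[obj].append(triple)
    return index


def retrieve_facts(index: Dict[str, List[Triple]], seeds: List[str], hops: int = 1) -> List[Triple]:
    """Breadth-first expansion from the seed entities, collecting triples up to `hops` away."""
    seen_nodes = set(seeds)
    seen_facts = set()
    facts: List[Triple] = []
    frontier = deque((node, 0) for node in seeds)
    while frontier:
        node, depth = frontier.popleft()
        if depth == hops:
            continue  # do not expand beyond the hop limit
        for triple in index.get(node, []):
            if triple not in seen_facts:
                seen_facts.add(triple)
                facts.append(triple)
            for neighbor in (triple[0], triple[2]):
                if neighbor not in seen_nodes:
                    seen_nodes.add(neighbor)
                    frontier.append((neighbor, depth + 1))
    return facts


kg = build_index([
    ("heat pump", "reduces", "energy consumption"),
    ("heat pump", "eligible for", "tax incentive"),
    ("tax incentive", "regulated by", "national decree"),
    ("solar panel", "produces", "electricity"),
])
print(len(retrieve_facts(kg, ["heat pump"], hops=1)))  # 2
print(len(retrieve_facts(kg, ["heat pump"], hops=2)))  # 3
```

Raising `hops` trades precision for recall. More distant facts can help multi-step questions, but they also add context the LLM has to filter.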

information, large language model, machine learning, (18 more...)

doi: 10.1007/978-3-031-97207-2_4

arXiv:2511.01643

Country: Europe > Italy (0.15)

Genre: Research Report > New Finding (1.00)

Industry:

Law (1.00)
Energy > Renewable (0.94)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Citation Failure: Definition, Analysis and Efficient Mitigation

Buchmann, Jan, Gurevych, Iryna

arXiv.org Artificial Intelligence, Oct-24-2025

Citations from LLM-based RAG systems are supposed to simplify response verification. However, this does not hold in the case of citation failure, where a model generates a helpful response but fails to cite complete evidence. In contrast to previous work, we propose to disentangle this from response failure, where the response itself is flawed and citing complete evidence is impossible. To address citation failure, this work follows a two-step approach: (1) we study when citation failure occurs and (2) how it can be mitigated. For step 1, we extend prior work by investigating how the relation between response and evidence affects citation quality. We introduce CITECONTROL, a benchmark that systematically varies this relation to analyze failure modes. Experiments show that failures increase with relational complexity and suggest that combining citation methods could improve performance, motivating step 2. To improve LLM citation efficiently, we propose CITENTION, a framework integrating generative, attention-based, and retrieval-based methods. Results demonstrate substantial citation improvements on CITECONTROL and in transfer settings. We make our data and code publicly available.
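
A simple way to integrate several citation methods is late score fusion. Each method's per-passage scores are normalized, the results are averaged, and every passage above a threshold is cited. This is a hypothetical sketch: the abstract does not say how CITENTION combines its generative, attention-based, and retrieval-based components. The weights, threshold, and scores below are made-up placeholders.

```python
from typing import Dict, List


def min_max(scores: List[float]) -> List[float]:
    """Rescale scores to [0, 1]; a constant score list carries no ranking signal."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [0.0] * len(scores)
    return [(s - lo) / (hi - lo) for s in scores]


def fuse_scores(method_scores: Dict[str, List[float]], weights: Dict[str, float]) -> List[float]:
    """Weighted average of per-method, min-max-normalized scores for each evidence passage."""
    n = len(next(iter(method_scores.values())))
    total_weight = sum(weights[m] for m in method_scores)
    fused = [0.0] * n
    for method, scores in method_scores.items():
        for i, s in enumerate(min_max(scores)):
            fused[i] += weights[method] * s / total_weight
    return fused


def select_citations(fused: List[float], threshold: float = 0.5) -> List[int]:
    """Cite every passage whose fused score reaches the threshold."""
    return [i for i, s in enumerate(fused) if s >= threshold]


# Placeholder scores for four candidate evidence passages from three hypothetical methods.
method_scores = {
    "generative": [1.0, 0.0, 0.0, 1.0],  # passages the LLM cited in its output
    "attention": [0.6, 0.1, 0.2, 0.5],   # e.g. attention mass on each passage
    "retrieval": [0.9, 0.2, 0.7, 0.4],   # e.g. retriever similarity to the response
}
weights = {"generative": 1.0, "attention": 1.0, "retrieval": 1.0}
print(select_citations(fuse_scores(method_scores, weights)))  # [0, 3]
```

Min-max normalization puts signals with different scales on a common footing before averaging. In practice the weights and threshold would need tuning on held-out data.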

computational linguistic, large language model, machine learning, (18 more...)

arXiv:2510.20303

Country:

Europe (1.00)
Asia (0.68)
North America > United States > Minnesota (0.28)

Genre: Research Report > New Finding (1.00)

Industry: Government > Regional Government (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
